Method of authenticating a plurality of files linked to a text document

ABSTRACT

A method of authenticating a text document with links to a plurality of files by modifying at least a selected attribute of invisible characters on a plurality of inter-word intervals of the text document, this method comprising the steps of computing (step 10) a one-way hash function of each file in order to obtain a hash value composed of a subset of hash digits for each one, encoding (step 16) each subset of a plurality of subsets of space characters in the document by replacing, in each subset of space characters, the value of the selected attribute for each space character by a corresponding encoded hash digit of each subset of hash digits corresponding to each file, computing (step 18) the electronic signature of the encoded text document by using a public-key algorithm, the signature being composed of a subset of signature digits, and encoding (step 20) another subset of space characters in the encoded document by replacing the value of the selected attribute for each space character by a corresponding encoded signature digit.

TECHNICAL FIELD

[0001] The present invention relates to methods of embedding the integrity information of a text document and of the files which are linked thereto in an invisible manner, and relates in particular to an improved method of authenticating the text document and the linked files.

BACKGROUND

[0002] With the increasing use of open networked environments, such as the Internet, the demand for more secure systems for transferring shared information among networked computers has correspondingly increased. Today, the most serious risk associated with electronic information exchange on open, unsecured networks, particularly on the Internet, is that digital data may be much more easily modified than ever before.

[0003] Most of today's transactions on the Internet involve access by the user to files on Web servers or mail servers directly from textual documents. On those open, unsecured networks, when a user selects and triggers a hyperlink on a Web page from a Web browser, or when a user clicks on the icon of a file attached to a received e-mail, it is becoming of the utmost importance to authenticate the received data files prior to using them as intended. Such data files may include, but are not limited to, computer programs, text, graphics, pictures, audio, video, or other information that is suitable for use within a computer system.

[0004] By way of example of those security concerns, if an e-mail includes an attachment to an executable file or software program, the user may wish to be sure that it has been sent by a trustworthy party prior to exposing his computer system to a program file that might include a “Trojan Horse” or that could infect the user's computer with a virus. Thus, when a user on the Internet receives data from a server or from another user, it may be necessary for the receiving user to verify that the data received has not been corrupted or otherwise altered in some manner. Furthermore, the receiving user may need to verify that the data received was actually sent by the proper sending user rather than by an impostor.

[0005] To improve the security of data transmitted over computer networks while preventing digital forgeries and impersonations, document authentication and signer authentication safeguards are being utilized.

[0006] Nowadays, digital signatures are the main cryptographic tools employed to provide document and signer authentication and integrity verification. Digital signatures are basically mechanisms through which users may authenticate the source of a received data file. Digital signatures achieve these results through cryptographic-key based algorithms, the security of these algorithms being based on the key (or keys), not on the details of the algorithm. In fact, the algorithms may be freely published and analyzed.

[0007] There are two general types of key based authentication algorithms well known in the art: symmetric and public-key. In symmetric algorithms the encryption key and the decryption key are the same and must be kept secret by both parties, the sender and the receiver. In public-key algorithms digital signatures are derived through the use of “public keys”. Public-key algorithms, also called asymmetric algorithms, are designed for using two different keys, so that one key, used for signing, is different from the second key, used for verification. Those algorithms are called “public-key” algorithms because the verification key can be made public. In contrast, the signature key needs to be kept secret by its owner, the signer. By the properties of cryptographic digital signatures there is no way to extract someone's digital signature from one document and attach it to another, nor is it possible to alter a signed message in any way without the change being detected. The slightest change in the signed document will cause the digital signature verification process to fail. Furthermore, the signing key cannot, in any reasonable amount of time, be calculated from the verification key.

[0008] Thus, using digital signatures involves two processes, one performed by the signer, which is the generation of the digital signature, and the other by the receiver of the digital signature, which is the verification of the signature. The signer creates a digital signature for the document by using his private signing key, and transmits both the document and the digital signature to the receiver. Verification is the process of checking the digital signature by reference to the received signed document and the public verification key.

[0009] In practical implementations, public-key algorithms are often too inefficient to digitally sign long documents. To save time, digital signature protocols (e.g., RSA, DSA) are often implemented with secure (one-way) hash functions. Basically, instead of signing a complete document, the signer computes a hash-value of the document and signs the computed hash. Many signature algorithms use one-way hash functions as internal building blocks.

[0010] A hash function is a function that takes a variable-length input string (e.g., a document) and converts it to a fixed-length output string, usually smaller, called a hash-value. The hash-value serves as a compact representative image of the input string. Computing a one-way hash function usually does not require a key. As such, when the document is received, the hash function may be used to verify that none of the data within the document has been altered since the generation of the hash-value. However, hash functions are typically limited in that the user may not necessarily infer anything about the associated data file, such as who sent it. In order to preserve the non-repudiation and unforgeability properties of digital signatures, when used in conjunction with a hash function, the hash function needs to be collision resistant. That is, it must be computationally infeasible to find two messages for which the hash maps to the same value.

[0011] For authenticating a document that includes a plurality of attachments or links to other files, not only the document, but all the files that are linked to it must be authenticated. To deal with those very frequent cases, typically a single digital signature is generated by applying the digital signature algorithm to an aggregate of the document and all the files attached. When such signed document and attached files are received, the verification algorithm must also be applied to the same aggregate of the received document and attached files.

[0012] Now, the process of signing and verifying, and/or computing hash values, places an additional overhead on sending and receiving computational resources. Particularly, when a user receives a document that contains many attachments to large files, the verification of the aggregate of the received document and all attached files would imply a tremendous burden on the receiving computer resources and unacceptable delays on such a computer network environment.

[0013] In the prior art, there are methods for efficiently securing and verifying the authenticity of a plurality of data files, such as data files intended to be transferred over computer networks. Those methods for verifying the authenticity of groups of data files involve providing, along with the group of data files, a separate signature file which includes individual check-values for all data files (e.g., hash-values) as well as a digital signature for the group. The digital signature of the group of files is then verified using a computer system, and check-values in the signature file are compared with the corresponding values computed from the data files using the computer system. This class of methods that generate a separate signature file for groups of data files is represented by the approach described in U.S. Pat. No. 5,958,051.

[0014] Obviously, all those methods that assume the addition of checking information to a separate file have the drawback of indeed separating checked and checking information (i.e., the signature file). Thus, the latter can easily be isolated and removed intentionally, in an attempt to cheat, or accidentally just because the intermediate pieces of equipment or the communication protocols in charge of forwarding electronic documents and data files are not devised to manipulate this extra piece of information. Then, when authenticating a document having file attachments or links to other files, the checking information of the document and all attached files should rather be encoded transparently into the body of the document itself (i.e., in a manner that does not affect the document's text format and readability whatsoever), so that it would remain intact across the various manipulations it is exposed to on its way to destination, still enabling the end-recipient to verify the authenticity and integrity of the received document and the attached or linked files.

SUMMARY OF THE INVENTION

[0015] Accordingly, the main object of the invention is to achieve a method of authenticating a text document and the files linked thereto so that the integrity of the document and of all linked files can be checked individually, while preventing the integrity information from being separated or lost, thus destroying the integrity of the document and the linked files.

[0016] The invention relates therefore to a method of authenticating a text document with links to a plurality of files by modifying at least a selected attribute of invisible characters on a plurality of inter-word intervals of the text document, this method comprising the steps of:

[0017] a) computing a one-way hash function of each file in order to obtain a hash value composed of a subset of hash digits for each one,

[0018] b) encoding each subset of a plurality of subsets of space characters in the text document by replacing in each subset of space characters, the value of the selected attribute for each space character by a corresponding encoded hash digit of each subset of hash digits corresponding to each file,

[0019] c) computing the electronic signature of the encoded text document by using a public-key algorithm composed of a subset of signature digits, and

[0020] d) encoding another subset of space characters in the encoded text document by replacing the value of the selected attribute for each space character by a corresponding encoded signature digit.

[0021] According to a preferred embodiment of the invention, the step of encoding includes the steps of transforming the text document into canonical form by setting on all inter-word intervals of the document the value of the selected attribute to the same default value, and for each file, encoding the hash digits of the hash value corresponding to the file as an ordered subset of values corresponding to the different values of the selected attribute, selecting a plurality of inter-word intervals among all inter-word intervals of the text document corresponding to a subset of space characters to be used for embedding the hash value into the text document, and replacing on each space character of the subset of space characters, the default attribute value of this space character by the corresponding encoded hash digit.

BRIEF DESCRIPTION OF THE DRAWINGS

[0022] The above and other objects, features and advantages of the invention will be better understood by reading the following more particular description of the invention in conjunction with the accompanying drawings wherein:

[0023] FIG. 1A is a flow chart representing the steps of the method according to the invention for authenticating a text document with links to a plurality of files.

[0024] FIG. 1B is a flow chart representing an alternative of the method illustrated in FIG. 1A.

[0025] FIG. 2A is a flow chart representing the different steps used in the step of encoding subsets of space characters within the method illustrated in FIG. 1A.

[0026] FIG. 2B is a flow chart representing the different steps used in the step of encoding a subset of space characters within the method illustrated in FIG. 1B.

[0027] FIG. 3 is a flow chart representing the different steps used in the step of encoding another subset of space characters using the electronic signature within the method illustrated in FIGS. 1A and 1B.

[0028] FIG. 4 is a flow chart representing the method for making the authentication of a text document which has been processed according to the method illustrated in FIGS. 1A and 1B.

DETAILED DESCRIPTION OF THE INVENTION

[0029] It is assumed that an e-mail text document with links to a plurality N of files is to be authenticated before being sent over the Internet network. Referring to FIG. 1A, by means of a one-way hash function (e.g. MD5), the authentication program computes the hash function of all the files. For this, the hash function of file n (with n=1 to N) is computed (step 10), a test is made to check whether n=N (step 12) and n is incremented by one (step 14) if n has not reached N.
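The loop of steps 10 to 14 can be sketched as follows. This is only an illustration under stated assumptions: the function names `hash_digits` and `hash_all_files` are hypothetical, the linked files are modeled as in-memory byte strings, and rendering the MD5 digest as a string of decimal digits is an assumption made so that the digits fit a ten-value attribute table.

```python
import hashlib


def hash_digits(data: bytes) -> str:
    """Return the MD5 hash of one file as a string of decimal digits.

    Writing the 128-bit digest in base 10 is an assumption of this sketch.
    """
    digest_hex = hashlib.md5(data).hexdigest()
    return str(int(digest_hex, 16))


def hash_all_files(files: list[bytes]) -> list[str]:
    """Steps 10 to 14: compute the hash value of file n for n = 1 to N."""
    return [hash_digits(data) for data in files]


if __name__ == "__main__":
    # Hypothetical file contents, for illustration only.
    linked_files = [b"first attachment", b"second attachment"]
    for n, digits in enumerate(hash_all_files(linked_files), start=1):
        print(f"H{n} = {digits}")
```

Each hash value is a sequence of digits, so each digit can later be carried by one space character.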

[0030] When the hash function of all files has been computed, N subsets of space characters respectively associated with the N files are encoded (step 16) using hash values resulting from the hash function computing. Such encoding starts from the first inter-word interval of the document and a blank space is left for separating the encoded hash digits of two consecutive files.

[0031] It must be noted that the encoded document appears identical to the original document. In fact, when displaying and when printing, there are no visually noticeable differences between them. Nevertheless, the input document and the encoded document are different. Using the action bar of WordPro for selecting “Text Properties”, and moving the cursor over the blanks of the encoded document, the encoded sequence of space character attributes which corresponds to the hash values of the files can be seen.

[0032] Then, by means of a public-key algorithm, using the private key, the authentication program computes the electronic signature of the already encoded document (step 18). Starting from the position of the last encoded hash value and leaving one blank space for separating the last groups of encoded hash digits, another subset of space characters is encoded by using the digits of the electronic signature (step 20).
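As a rough illustration of step 18, the sketch below signs a hash of the encoded document with a private exponent and returns the signature as decimal digits. Everything here is an assumption for illustration: the text does not name a specific public-key algorithm, the textbook-RSA parameters (p=61, q=53) are far too small for real use, and the encoded document, including its space attributes, is assumed to be serialized to bytes beforehand. The names `sign`, `verify` and `digest_mod_n` are hypothetical.

```python
import hashlib

# Toy textbook-RSA parameters (p = 61, q = 53); NOT secure, illustration only.
N_MOD = 3233        # p * q
PUBLIC_EXP = 17     # verification (public) exponent
PRIVATE_EXP = 2753  # signing (private) exponent, 17 * 2753 = 1 mod 3120


def digest_mod_n(document: bytes) -> int:
    """Hash the serialized document and reduce it below the toy modulus."""
    return int(hashlib.md5(document).hexdigest(), 16) % N_MOD


def sign(document: bytes) -> str:
    """Step 18 (sketch): sign the document hash with the private key."""
    return str(pow(digest_mod_n(document), PRIVATE_EXP, N_MOD))


def verify(document: bytes, signature_digits: str) -> bool:
    """Check a signature with the public key only."""
    return pow(int(signature_digits), PUBLIC_EXP, N_MOD) == digest_mod_n(document)


if __name__ == "__main__":
    encoded_document = b"serialized text with encoded space attributes"
    signature = sign(encoded_document)
    print(signature, verify(encoded_document, signature))
```

The resulting digit string is what step 20 would then embed on a further subset of space characters.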

[0033] It must be noted that the authenticated document is also identical to the original document. In other words, when displayed or printed, there are no visually noticeable differences between them. However, moving the cursor over the blanks when “Text Properties” of WordPro has been selected, the encoded sequence of attributes that corresponds to the hash values of the files and to the electronic signature can be seen.

[0034] Note that an alternative of the above method, illustrated in FIG. 1B, may be used. Instead of computing the hash function of all the files before encoding the subsets of space characters with the hash values, the hash function of a file is computed (step 22) just before encoding a subset of space characters by using the hash value resulting from the hash function (step 24). Then, it is checked whether n=N (step 26) and n is incremented by one if it is not the case (step 28). Finally, the steps of computing the electronic signature of the document (step 18) and of encoding another subset of space characters using the electronic signature (step 20) are the same as in the preceding embodiment.

[0035] The method of encoding a subset of space characters (steps 16 and 20 in FIG. 1A or steps 24 and 20 in FIG. 1B) is based upon modifying invisible parameters of the inter-word or space characters of a text without affecting the format and the visual appearance of the original text. Such parameters correspond to character attributes including the font type, text color, italic, bold or protected attributes of the space characters or any combination thereof.

[0036] Assuming that the color attributes of the space characters are selected, a mapping table between such color attributes and the digits of the hash value may be as follows.

ENCODED VALUE    COLOR ATTRIBUTE
1                GRAY
2                DARK GRAY
3                RED
4                DARK RED
5                YELLOW
6                DARK YELLOW
7                GREEN
8                DARK GREEN
9                CYAN
0                DARK CYAN
NONE             BLACK
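The mapping table above can be expressed as a simple lookup in both directions. The sketch below is illustrative only; the names `DIGIT_TO_COLOR`, `encode_digits` and `decode_colors` are hypothetical, and colors are modeled as plain strings rather than as attributes of a real word processor.

```python
# Mapping table of paragraph [0036]; BLACK is the default ("NONE") value
# and therefore carries no digit.
DIGIT_TO_COLOR = {
    "1": "GRAY", "2": "DARK GRAY", "3": "RED", "4": "DARK RED",
    "5": "YELLOW", "6": "DARK YELLOW", "7": "GREEN", "8": "DARK GREEN",
    "9": "CYAN", "0": "DARK CYAN",
}
COLOR_TO_DIGIT = {color: digit for digit, color in DIGIT_TO_COLOR.items()}
DEFAULT_COLOR = "BLACK"


def encode_digits(digits: str) -> list[str]:
    """Turn a string of decimal digits into an ordered list of colors."""
    return [DIGIT_TO_COLOR[d] for d in digits]


def decode_colors(colors: list[str]) -> str:
    """Inverse mapping: recover the decimal digits from the colors."""
    return "".join(COLOR_TO_DIGIT[c] for c in colors)


if __name__ == "__main__":
    print(encode_digits("305"))
```

Because BLACK never appears as an encoded value, a space left at the default color can be told apart from a space that carries a digit.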

[0037] Note that the color attribute could be combined with another attribute such as italic. Selecting the pair formed by text color and italic provides as many different choices as the number of combinations of colors in the palette of colors and italic/non italic.

[0038] The method of encoding is illustrated in FIG. 2A. First, the text document where the data is to be embedded is transformed into canonical form (step 30) by setting on all spaces of the text at least one of the selected attributes to the same default value. Thus, with the selection of the color attribute, this one is set to the (default) BLACK color for all space characters. In such a case, all space characters have by default the WHITE attribute for the background color. Note that setting a default value on any space character means that no information has been encoded on this space. The hash value of each file n (n from 1 to N) is then encoded (step 32) by using the set of encoded attribute values in the above table to obtain an ordered sequence of attribute values.

[0039] After having selected an inter-word interval among the inter-word intervals of the document to be used for encoding (step 34), such an interval being not already used, the default values of the attributes are replaced by the corresponding encoded attribute values of the ordered set of encoded attribute values for each space character of the selected subset of space characters (step 36). Note that the best way is to select consecutive intervals from the beginning of the document.

[0040] A test is then made to check whether the processed file is the last one, that is whether n=N (step 38). If not, n is incremented by one (step 40) and all the above steps are repeated except the step of transforming the text document into canonical form. The process is ended when the hash value of the last file has been embedded into the document.
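Steps 30 to 40 can be sketched on a simplified model in which the document is reduced to the list of color attributes of its space characters. This model, and the names `canonical_form`, `embed` and `embed_all`, are assumptions for illustration; the one-space separator between consecutive files follows paragraph [0030].

```python
DEFAULT = "BLACK"  # default attribute value: no information encoded


def canonical_form(num_spaces: int) -> list[str]:
    """Step 30: set the selected attribute of every space to the default."""
    return [DEFAULT] * num_spaces


def embed(spaces: list[str], start: int, encoded: list[str]) -> int:
    """Steps 34 and 36: write one encoded hash onto consecutive spaces.

    Returns the position for the next file, leaving one default space
    as a separator.
    """
    end = start + len(encoded)
    if end > len(spaces):
        raise ValueError("not enough inter-word intervals left in the document")
    spaces[start:end] = encoded
    return end + 1


def embed_all(num_spaces: int, encoded_hashes: list[list[str]]) -> list[str]:
    """Steps 30 to 40: canonicalize once, then embed file 1 to file N."""
    spaces = canonical_form(num_spaces)
    position = 0  # start from the first inter-word interval
    for encoded in encoded_hashes:
        position = embed(spaces, position, encoded)
    return spaces


if __name__ == "__main__":
    print(embed_all(6, [["RED", "GREEN"], ["CYAN"]]))
```

A real hash value needs one space per digit, so the document must contain enough inter-word intervals for all N hash values, their separators and the signature.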

[0041] In the above example wherein the selected attribute is text color, there is no difficulty in encoding data represented in the decimal base insofar as there are more than 10 colors to represent the decimal figures 0, 1 . . . 9.

[0042] Assuming that a different attribute is selected wherein there are fewer than 10 possible choices, such an attribute would not be useful for the data to be embedded in the decimal base. Even in such a case, it would be possible to use such an attribute provided that the data is represented according to a numerical base N not exceeding the number of different possible attribute values. Thus, if there are 5 different possible choices for the selected attribute, the data will be represented in base 5 with figures 0-4. Of course, such a representation of the data requires reserving more spaces in the text document for encoding information than by using, for instance, a decimal base.
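The change of base described above can be illustrated with a short conversion routine. The function name `to_base` is hypothetical; the sketch only shows that a smaller base yields more figures, and hence needs more space characters, for the same value.

```python
def to_base(value: int, base: int) -> list[int]:
    """Return the figures of a non-negative integer in the given base,
    most significant figure first."""
    if base < 2:
        raise ValueError("base must be at least 2")
    if value < 0:
        raise ValueError("value must be non-negative")
    if value == 0:
        return [0]
    figures = []
    while value:
        value, remainder = divmod(value, base)
        figures.append(remainder)
    return figures[::-1]


if __name__ == "__main__":
    # 123 = 4*25 + 4*5 + 3, so its base-5 figures are 4, 4, 3.
    print(to_base(123, 5))
    # 999 needs three decimal figures but five figures in base 5.
    print(len(to_base(999, 10)), len(to_base(999, 5)))
```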

[0043] Another possibility for using an attribute allowed to take only a small number of different values is to combine it with another attribute. As an example, the above attribute taking 5 values could be combined with another attribute, such as italic/non italic, having two possible choices, to represent the 10 figures (0 to 9) of the data encoded in the decimal base.

[0044] For example, the following correspondence or mapping table associates a pair of attributes, for instance the color attribute and the italic/non-italic attribute, to hexadecimal digits:

ENCODED VALUE    COLOR ATTRIBUTE    ITALIC ATTRIBUTE
0                CYAN               NO
1                DARK CYAN          NO
2                RED                NO
3                DARK RED           NO
4                YELLOW             NO
5                DARK YELLOW        NO
6                GREEN              NO
7                DARK GREEN         NO
8                CYAN               YES
9                DARK CYAN          YES
A                RED                YES
B                DARK RED           YES
C                YELLOW             YES
D                DARK YELLOW        YES
E                GREEN              YES
F                DARK GREEN         YES
NONE             BLACK              Don't care
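The table above follows a regular pattern: the color gives the hexadecimal digit modulo 8 and the italic attribute marks the upper half (8 to F). The sketch below exploits that pattern; the names `encode_hex_digit` and `decode_pair` are hypothetical, and attributes are modeled as a (color, italic) tuple.

```python
# Colors in the order used by the table of paragraph [0044].
COLORS = [
    "CYAN", "DARK CYAN", "RED", "DARK RED",
    "YELLOW", "DARK YELLOW", "GREEN", "DARK GREEN",
]


def encode_hex_digit(digit: str) -> tuple[str, bool]:
    """Map one hexadecimal digit to a (color, italic) pair."""
    if len(digit) != 1:
        raise ValueError("expected a single hexadecimal digit")
    value = int(digit, 16)
    return COLORS[value % 8], value >= 8


def decode_pair(color: str, italic: bool) -> str:
    """Inverse mapping: recover the hexadecimal digit from the pair."""
    return format(COLORS.index(color) + (8 if italic else 0), "X")


if __name__ == "__main__":
    for d in "0A8F":
        print(d, encode_hex_digit(d))
```

With 16 combinations, one space character carries a full hexadecimal digit, so an MD5 hash value written in hexadecimal needs 32 space characters.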

[0045] If the alternative method illustrated in FIG. 1B is used, the encoding step represented in FIG. 2B includes the same substeps. Indeed, after the text document has been transformed into canonical form as previously (step 30), the encoding step consists, as previously, in encoding the hash value of file n (step 32), selecting an inter-word interval in the text document different from the intervals already used (step 34) and replacing the default attribute values of space characters of the selected interval by the encoded hash digits.

[0046] Whatever the method being used, the step of encoding another subset of space characters using the electronic signature, illustrated in FIG. 3, consists in encoding the electronic signature by using the set of attribute values in the above table to obtain an ordered sequence of attribute values (step 42), selecting a subset of space characters in the document different from the intervals already used for encoding the files (step 44) and replacing the default attribute values of this subset of space characters by the encoded signature digits (step 46).

[0047] Now, assuming that the encoded document with the linked files is received by e-mail, the method of authentication illustrated in FIG. 4 is the following. First, the invisibly encoded information is recovered from the received document (step 50) by decoding the encoded attributes in the inter-word intervals which have been used for encoding. Note that the encoded space characters are different from the non-encoded space characters, the attributes of which have been set to the same default value. Thus, a value S is recovered for the electronic signature and values H₁ . . . H_(N) are recovered for the hash values of the N files.
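The recovery of step 50 can be sketched on the same simplified model as the encoding, where the document is reduced to the list of attribute values of its space characters. The sketch assumes that consecutive groups are separated by at least one default-valued space and that the signature is the last group, as in paragraphs [0030] and [0032]; the names `recover_groups` and `split_signature` are hypothetical.

```python
from itertools import groupby


def recover_groups(spaces: list[str], default: str = "BLACK") -> list[list[str]]:
    """Step 50 (sketch): collect each run of non-default attribute values."""
    return [
        list(group)
        for is_default, group in groupby(spaces, key=lambda v: v == default)
        if not is_default
    ]


def split_signature(groups: list[list[str]]) -> tuple[list[list[str]], list[str]]:
    """Separate the encoded hash values H1..HN from the signature S,
    assuming the signature was embedded last."""
    if len(groups) < 2:
        raise ValueError("expected at least one hash group and a signature")
    return groups[:-1], groups[-1]


if __name__ == "__main__":
    received = ["RED", "GREEN", "BLACK", "CYAN", "BLACK", "YELLOW", "BLACK"]
    hashes, signature = split_signature(recover_groups(received))
    print(hashes, signature)
```

Each recovered group would then be mapped back to digits through the correspondence table used for encoding.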

[0048] The encoded digits of the recovered value S are then removed from the document (step 52). Thus, the new document still includes the invisibly encoded values H₁ . . . H_(N) and appears identical to the received document, although the two documents are different.

[0049] Using the new document from which the encoded electronic signature has been removed, an electronic signature S* is computed (step 54) by means of the same public-key algorithm. Then, a test is made to check whether the values S and S* are identical (step 56). If not, the document is rejected (step 58). If so, there is authentication of the received document (step 60).

[0050] Then, by means of the same one-way hash function (e.g. MD5) used by the encoding program when the document was sent, the verification program computes the hash values H₁*, H₂*, . . . H_(N)* of the linked files (step 62). A test is then made to check whether the recovered hash value H_(n) and the computed hash value H_(n)* are identical for each file n, n being 1 to N (step 64). If not, the received file must be rejected (step 66). If so, this means that there is authentication of file n (step 68). Finally, n is incremented by one (step 70) until all files have been checked.
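The per-file check of steps 62 to 70 can be sketched as below. This is an illustration under assumptions: recovered hash values are taken to be MD5 digests written as hexadecimal strings, files are in-memory byte strings, and the name `check_files` is hypothetical.

```python
import hashlib


def check_files(recovered_hashes: list[str], files: list[bytes]) -> list[bool]:
    """Steps 62 to 70: compare each recovered hash value with the hash
    value recomputed from the received file.

    Returns one flag per file: True if file n is authenticated,
    False if it must be rejected.
    """
    if len(recovered_hashes) != len(files):
        raise ValueError("one recovered hash value is needed per file")
    return [
        hashlib.md5(data).hexdigest() == recovered
        for recovered, data in zip(recovered_hashes, files)
    ]


if __name__ == "__main__":
    original = b"attachment as sent"
    recovered = [hashlib.md5(original).hexdigest()] * 2
    # The second received file has been altered in transit.
    print(check_files(recovered, [original, b"attachment as altered"]))
```

Because each file has its own embedded hash value, a single altered file can be rejected without invalidating the others.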

[0051] The above authentication method, being protocol and data format independent, can be applied to many different software packages, such as e-mail systems, that generate textual documents that contain links to all types of files. Also, a Web page such as an HTML document that contains hyperlinks to other Web pages can be authenticated and the integrity of said hyperlinks be checked by using this method.

[0052] It must be noted that, in any case, communication systems exchanging text documents in electronic form (soft copy) must be compatible for using the invention. It is so for almost all modern office and e-mail products. It is also important to note that, even if a system does not support colors (but only black and white texts), it would still be possible to encode invisible information on the blanks of a plain text by using for encoding one or a combination of several different possible attributes, like the font type, italic, bold or protected attributes.

1. A method of authenticating a text document with links to a plurality of files by modifying at least a selected attribute of invisible characters on a plurality of inter-word intervals of said text document; said method comprising the steps of: a) computing (step 10 or 22) a one-way hash function of each file of said plurality of files in order to obtain a hash value composed of a subset of hash digits for each one, b) encoding (step 16 or 24) each subset of a plurality of subsets of space characters in said text document by replacing in each subset of space characters, the value of said selected attribute for each space character by a corresponding encoded hash digit of each subset of hash digits corresponding to each one of said files, c) computing (step 18) the electronic signature of the encoded text document by using a public-key algorithm composed of a subset of signature digits, and d) encoding (step 20) another subset of space characters in said encoded text document by replacing the value of said selected attribute for each space character by a corresponding encoded signature digit.

2. The method according to claim 1, wherein said step b) includes the steps of: b₁) transforming (step 30) said text document into canonical form by setting on all inter-word intervals of said document the value of said selected attribute to the same default value, and for each file: b₂) encoding (step 32) the hash digits of the hash value corresponding to said file as an ordered subset of values corresponding to the different values of said selected attribute, b₃) selecting (step 34) a plurality of inter-word intervals among all inter-word intervals of said text document corresponding to a subset of space characters to be used for embedding said hash value into said text document, and b₄) replacing (step 36) on each space character of said subset of space characters, the default attribute value of this space character by the corresponding encoded hash digit.

3. The method according to claim 2, wherein said step d) includes the steps of: d₁) encoding (step 42) the signature digits as an ordered subset of values corresponding to the different values of said selected attribute, d₂) selecting (step 44) said another subset of space characters as being a subset of inter-word intervals different from any one of said plurality of inter-word intervals, and d₃) replacing (step 46) on each space character of said another subset of space characters the default attribute value of the space character by the corresponding encoded signature digit.

4. The method according to claim 2 or 3, wherein said step of encoding (step 32 or 42) the hash digits or the signature digits consists in using a set of attribute values which are encoded by establishing a correspondence table between said attribute values and said digits.

5. The method according to claim 4, wherein said digits to be encoded are a sequence of figures which can be each one of figures 0, 1, 2, . . . , N-1 in the N base, said figures corresponding respectively to N selected attribute values.

6. The method according to claim 5, wherein said selected attribute is the character color, said attribute values corresponding to N different colors which can be selected for the color attribute.

7. The method according to claim 6, wherein said digits to be encoded are represented by decimal figures in the decimal base (N=10), each figure 0 to 9 being associated respectively to a color defined by the character color attribute.

8. The method according to claim 5, 6 or 7, wherein two attributes are used in combination so that each of said figures 0, 1, 2, . . . N-1 in the N base corresponds respectively to a combination of a selected value of a first attribute and a selected value of a second attribute.

9. The method according to claim 8, wherein said second attribute is the “italic” format of a character, the attribute value corresponding to “italic” or “non italic”.

10. A method of doing the authentication of a text document with links to a plurality of N files received by a communication system wherein said text document includes invisible authentication data which have been incorporated in said document by modifying selected invisible attributes on the space characters by using the method according to any one of the claims 1 to 9, said method comprising the steps of: transforming (26) said text document into canonical form by setting on all inter-word intervals of said received document the values of said selected attributes to the same default value, recovering invisibly encoded data composed of an origin electronic signature and a plurality N of origin hash values corresponding to said files, said invisibly encoded data corresponding to predefined subsets of space characters wherein the values of said selected attribute are different from a same default value, removing the recovered value of said electronic signature from the received document to obtain a new document, computing a new electronic signature from said new document by using the same public-key algorithm being used when the document has been encoded, comparing said new electronic signature to said origin electronic signature, and if said new electronic signature is identical to said origin electronic signature, computing a one-way hash function of each of said files in order to obtain a new hash value for each one, and comparing the new hash value to the origin hash value for each file n of said N files with n being 1 to N in order to authenticate said file n.

11. The method according to any one of the preceding claims wherein said document is a document sent by e-mail over the Internet network.

12. A system comprising means adapted for carrying out the steps of the method according to claims 1 to 11.

13. A computer program product comprising a computer usable medium having computer readable program code means for carrying out the method according to any one of claims 1 to 11.